Expanding the scope of 2D thread distribution to improve multi-threaded DGEMM performance #4655

yamazakimitsufumi · 2024-04-18T09:34:56Z

This pull request proposes a fix for issue #4644.

The 2D thread distribution currently implemented in level3_thread.c works well for small matrices. However, as the matrix size grows, it falls back to a simple one-dimensional distribution in the M direction. This pull request improves multi-threaded performance for large matrices by expanding the scope of the 2D thread distribution.

Multi-threaded DGEMM performance improved by about 10% on Graviton3E (64 cores) and by more than 20% on Xeon Platinum 8375C (32 cores x 2 sockets).

Benchmark plots (images not preserved): Graviton3E; Xeon Platinum 8375C

The computation is distributed so that each thread handles ranges of roughly equal size in the M and N directions, even when the input matrices are rectangular. Although this has not been confirmed on all platforms, this relatively simple fix is expected to be generally effective on modern many-core CPUs.

martin-frbg · 2024-04-18T11:22:25Z

That's impressively elegant and effective.

rageshhajela16 · 2024-05-13T21:26:54Z

@martin-frbg Thanks for the review. Are there any other comments we should incorporate? The CI failures seem to be unrelated to the code changes and may be environment issues. Could you please help review and confirm? Thanks
cc: @yamazakimitsufumi

Expanding the scope of 2D thread distribution

51ab190

martin-frbg added this to the 0.3.28 milestone Apr 18, 2024

martin-frbg merged commit 6ca9ffa into OpenMathLib:develop May 14, 2024
67 of 70 checks passed

martin-frbg mentioned this pull request Jun 25, 2024 in "Segfault in sgemm when used in open-webui #4766" (closed)
